{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 模型评估\n",
    "\n",
    "在本 Notebook 中，我们将：\n",
    "\n",
    "1. 加载测试数据和已训练好的模型\n",
    "2. 生成模型的预测结果\n",
    "3. 计算并展示模型的准确率、分类报告和混淆矩阵\n",
    "4. 保存混淆矩阵和分类报告的图片\n",
    "\n",
    "这是模型评估的最后一步，帮助我们了解模型在测试数据上的性能。\n"
   ],
   "id": "bfcfd6078ecee7f5"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# 导入必要的库\n",
    "import os\n",
    "import pandas as pd\n",
    "import joblib\n",
    "from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# 定义前 30 个重要特征的列表\n",
    "TOP_30_FEATURES = [\n",
    "    'Packet Length Variance', 'Packet Length Std', 'Max Packet Length', 'Subflow Bwd Bytes',\n",
    "    'Avg Bwd Segment Size', 'Average Packet Size', 'Packet Length Mean', 'Bwd Packet Length Max',\n",
    "    'Total Length of Bwd Packets', 'Bwd Packet Length Std', 'Init_Win_bytes_backward',\n",
    "    'Flow Bytes/s', 'Min Packet Length', 'Bwd Packet Length Mean', 'Flow IAT Max',\n",
    "    'Total Length of Fwd Packets', 'Fwd Packet Length Mean', 'Flow Duration', 'Flow IAT Min',\n",
    "    'Init_Win_bytes_forward', 'act_data_pkt_fwd', 'Total Fwd Packets', 'Fwd Header Length',\n",
    "    'Subflow Fwd Bytes', 'ACK Flag Count', 'Avg Fwd Segment Size', 'Flow IAT Std',\n",
    "    'Bwd IAT Mean', 'Fwd IAT Min', 'Idle Max'\n",
    "]\n"
   ],
   "id": "7f49747a0d430d7"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 单元格解释：\n",
    "在此单元格中，我们导入了需要的库，包括 `pandas`、`joblib`、`matplotlib`、`seaborn` 以及 `sklearn` 的评估指标工具。`TOP_30_FEATURES` 列表包含前 30 个重要特征，我们将在加载测试数据时使用这些特征进行模型评估。\n"
   ],
   "id": "c53ed9e4f3ad8502"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# 加载测试数据和训练好的模型\n",
    "data_dir = '../data/processed'  # 数据目录\n",
    "model_path = '../models/random_forest_model_top30.pkl'  # 已训练好的模型路径\n",
    "output_dir = '../results'  # 输出结果保存目录\n",
    "\n",
    "# 加载测试数据并选择前 30 个特征\n",
    "X_test = pd.read_csv(os.path.join(data_dir, 'X_test.csv'))[TOP_30_FEATURES]\n",
    "y_test = pd.read_csv(os.path.join(data_dir, 'y_test.csv')).values.ravel()\n",
    "\n",
    "# 加载训练好的模型\n",
    "model = joblib.load(model_path)\n"
   ],
   "id": "83f5aebe1612a108"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 单元格解释：\n",
    "这个单元格加载了测试数据和训练好的随机森林模型。我们从 `X_test.csv` 中选择前 30 个重要特征的数据，并加载 `y_test.csv` 作为测试集标签。随后，我们使用 `joblib.load` 函数加载预先保存的模型文件。\n"
   ],
   "id": "f122ddc9e0b28aaf"
  },
  {
   "cell_type": "code",
   "id": "initial_id",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# 生成预测结果并计算准确率、分类报告和混淆矩阵\n",
    "y_pred = model.predict(X_test)  # 生成预测结果\n",
    "accuracy = accuracy_score(y_test, y_pred)  # 计算准确率\n",
    "report = classification_report(y_test, y_pred)  # 生成分类报告\n",
    "cm = confusion_matrix(y_test, y_pred)  # 生成混淆矩阵\n"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 单元格解释：\n",
    "在此单元格中，我们使用训练好的模型对 `X_test` 进行预测，并生成预测结果 `y_pred`。然后，我们计算模型的准确率 `accuracy`，生成分类报告 `report` 和混淆矩阵 `cm`。这些评估指标帮助我们全面了解模型在测试集上的表现。\n"
   ],
   "id": "87f56986e9f8ee4c"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# 保存混淆矩阵和分类报告图片\n",
    "os.makedirs(output_dir, exist_ok=True)  # 创建结果保存目录（若不存在）\n",
    "\n",
    "# 绘制混淆矩阵\n",
    "plt.figure(figsize=(10, 7))\n",
    "sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')\n",
    "plt.xlabel('Predicted')\n",
    "plt.ylabel('Actual')\n",
    "plt.title('Confusion Matrix')\n",
    "plt.savefig(os.path.join(output_dir, 'confusion_matrix.png'))  # 保存混淆矩阵图片\n",
    "plt.show()\n",
    "\n",
    "# 绘制分类报告\n",
    "plt.figure(figsize=(10, 7))\n",
    "plt.text(0.01, 0.05, str(report), fontproperties='monospace')  # 将分类报告作为文本显示\n",
    "plt.axis('off')\n",
    "plt.title('Classification Report')\n",
    "plt.savefig(os.path.join(output_dir, 'classification_report.png'))  # 保存分类报告图片\n",
    "plt.show()\n",
    "\n",
    "print(\"模型评估完成\")\n"
   ],
   "id": "c9dd9fcce09c4020"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 单元格解释：\n",
    "这个单元格保存并展示混淆矩阵和分类报告图片：\n",
    "\n",
    "1. 使用 `seaborn` 库的 `heatmap` 函数绘制混淆矩阵，并保存为 `confusion_matrix.png`。\n",
    "2. 将分类报告转换为文本格式显示，并保存为 `classification_report.png`。\n",
    "\n",
    "通过可视化这些结果，我们可以更直观地评估模型的性能。最后打印 \"模型评估完成\" 以确认整个流程的结束。\n"
   ],
   "id": "d54e4e3f6c2eb420"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
